Assessing Parameter Identifiability in Phylogenetic Models Using Data Cloning
نویسندگان
چکیده
The success of model-based methods in phylogenetics has motivated much research aimed at generating new, biologically informative models. This new computer-intensive approach to phylogenetics demands validation studies and sound measures of performance. To date there has been little practical guidance available as to when and why the parameters in a particular model can be identified reliably. Here, we illustrate how Data Cloning (DC), a recently developed methodology to compute the maximum likelihood estimates along with their asymptotic variance, can be used to diagnose structural parameter nonidentifiability (NI) and distinguish it from other parameter estimability problems, including when parameters are structurally identifiable, but are not estimable in a given data set (INE), and when parameters are identifiable, and estimable, but only weakly so (WE). The application of the DC theorem uses well-known and widely used Bayesian computational techniques. With the DC approach, practitioners can use Bayesian phylogenetics software to diagnose nonidentifiability. Theoreticians and practitioners alike now have a powerful, yet simple tool to detect nonidentifiability while investigating complex modeling scenarios, where getting closed-form expressions in a probabilistic study is complicated. Furthermore, here we also show how DC can be used as a tool to examine and eliminate the influence of the priors, in particular if the process of prior elicitation is not straightforward. Finally, when applied to phylogenetic inference, DC can be used to study at least two important statistical questions: assessing identifiability of discrete parameters, like the tree topology, and developing efficient sampling methods for computationally expensive posterior densities.
منابع مشابه
Identifiability of the Gtr+γ Model of Molecular Evolution
Inference of evolutionary trees and rates from biological sequences is commonly performed using models of character change that incorporate rate variation across sites. Though an incorrect proof of the identifiability of the GTR+Γ+I model has been published, very little has been rigorously established concerning the identifiability of the models currently in common use in data analysis. Here we...
متن کاملIdentifiability of a Markovian model of molecular evolution with Gamma-distributed rates
Inference of evolutionary trees and rates from biological sequences is commonly performed using continuous-time Markov models of character change. The Markov process evolves along an unknown tree while observations arise only from the tips of the tree. Rate heterogeneity is present in most real data sets and is accounted for by the use of flexible mixture models where each site is allowed its o...
متن کاملParameter Identifiability Issues in a Latent Ma- rkov Model for Misclassified Binary Responses
Medical researchers may be interested in disease processes that are not directly observable. Imperfect diagnostic tests may be used repeatedly to monitor the condition of a patient in the absence of a gold standard. We consider parameter identifiability and estimability in a Markov model for alternating binary longitudinal responses that may be misclassified. Exactly ...
متن کاملEstimating trees from filtered data: identifiability of models for morphological phylogenetics.
As an alternative to parsimony analyses, stochastic models have been proposed (Lewis, 2001; Nylander et al., 2004) for morphological characters, so that maximum likelihood or Bayesian analyses may be used for phylogenetic inference. A key feature of these models is that they account for ascertainment bias, in that only varying, or parsimony-informative characters are observed. However, statisti...
متن کاملParameter Identifiability of Fundamental Pharmacodynamic Models
Issues of parameter identifiability of routinely used pharmacodynamics models are considered in this paper. The structural identifiability of 16 commonly applied pharmacodynamic model structures was analyzed analytically, using the input-output approach. Both fixed-effects versions (non-population, no between-subject variability) and mixed-effects versions (population, including between-subject...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 61 شماره
صفحات -
تاریخ انتشار 2012